Team:NYU Abu Dhabi/Documentation/DOCS/Data/Data and Infectious Diseases

Data and Infectious Diseases


@Prajjwal Bhattarai

Data Sources

Data sources for infectious diseases mainly focus on biosurveillance in humans. They come either as lab-tested records (FluNet) or as aggregated web-scraped or user-submitted records (ProMED, HealthMap). These sources are designed for general public awareness and early alerts about growing epidemics, and they do not include individual sample details. They have limited data on animals, and even less on amphibians. Even so, their design and implementation details could be reused if we need to build our own database (Daughton, 2017).

For amphibians, the Amphibian Disease Portal is a disease-based resource directly applicable to our project. Its data are collected as samples tested for bd (Batrachochytrium dendrobatidis) or bsal (Batrachochytrium salamandrivorans). At minimum, each record has a geolocation, collection year, genus, and disease prevalence. The portal holds about 30k samples, and a disproportionate 25k of them come from North America.
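The geographic imbalance above can be quantified before any analysis is attempted. The sketch below is a minimal example under stated assumptions. The `Sample` record and its `region` field are hypothetical and do not reflect the portal's actual schema. The placeholder records only reproduce the approximate 25k-of-30k split, scaled down by 1000.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical minimal record mirroring the fields listed above;
    # this is NOT the Amphibian Disease Portal's actual schema.
    latitude: float
    longitude: float
    year: int
    genus: str
    region: str       # hypothetical region label, e.g. derived from coordinates
    positive: bool    # whether bd/bsal was detected


def region_shares(samples):
    """Fraction of all samples contributed by each region."""
    counts = Counter(s.region for s in samples)
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


# Placeholder records reproducing ~25k of ~30k samples from North America
# (scaled down 1000x); only the region field matters here.
samples = (
    [Sample(0.0, 0.0, 2015, "Rana", "North America", False)] * 25
    + [Sample(0.0, 0.0, 2015, "Rana", "Other", False)] * 5
)
shares = region_shares(samples)
print(round(shares["North America"], 3))  # 0.833
```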

AmphibiaWeb and AmphiBIO are useful sources of amphibian traits related to ecology, morphology, and reproduction, which analysis and modeling may require.

Contribution

AmphibiaWeb uses GEOME (geome-db) as its backend. GEOME lets users submit sample data, which in our case would then be used by AmphibiaWeb. A sample submission to geome-db requires at least the following fields:

  • Location as latitude/longitude coordinates
  • Time collected
  • Genus
  • Record basis (live/deceased organism, fossil sample, etc.)
  • Sample type (swab, tissue, blood, etc.)
  • Disease tested, and whether it was detected
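A device-side check that a record contains the minimum fields above could look like the following sketch. The field names and the small controlled vocabularies are illustrative assumptions. geome-db defines its own templates and column names, and those would need to be mapped onto this structure.

```python
from dataclasses import dataclass
from datetime import date

# Illustrative vocabularies only; geome-db maintains its own controlled lists.
RECORD_BASIS = {"live organism", "deceased organism", "fossil"}
SAMPLE_TYPES = {"swab", "tissue", "blood"}
DISEASES = {"Bd", "Bsal"}


@dataclass
class Submission:
    # Hypothetical field names covering the minimum specification above.
    latitude: float
    longitude: float
    collected: date
    genus: str
    record_basis: str
    sample_type: str
    disease_tested: str
    detected: bool


def validate(sub):
    """Return a list of problems; an empty list means the record looks complete."""
    problems = []
    if not -90 <= sub.latitude <= 90:
        problems.append("latitude out of range")
    if not -180 <= sub.longitude <= 180:
        problems.append("longitude out of range")
    if sub.collected > date.today():
        problems.append("collection date is in the future")
    if not sub.genus.strip():
        problems.append("genus missing")
    if sub.record_basis not in RECORD_BASIS:
        problems.append(f"unknown record basis: {sub.record_basis}")
    if sub.sample_type not in SAMPLE_TYPES:
        problems.append(f"unknown sample type: {sub.sample_type}")
    if sub.disease_tested not in DISEASES:
        problems.append(f"unknown disease: {sub.disease_tested}")
    return problems


ok = Submission(24.45, 54.38, date(2021, 6, 1), "Sclerophrys",
                "live organism", "swab", "Bd", False)
bad = Submission(124.45, 54.38, date(2021, 6, 1), "Sclerophrys",
                 "live organism", "swab", "Bd", False)
print(validate(ok))   # []
print(validate(bad))  # ['latitude out of range']
```

Collecting every problem instead of stopping at the first one lets the device show the user all missing or invalid fields at once.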

The diagnostic device could be integrated through a script that automates uploading sample data to geome-db. However, this depends on whether the device has enough computational resources. It would also require a GPS chip for the coordinates and a way to determine the sample's genus, either entered by the user or obtained through other means. Following Hansen, 2019, which applies a deep learning model to pictures of insects, we could implement a model that predicts genus from a picture of the amphibian. However, its accuracy and applicability for amphibians are uncertain.
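The device may not always be online, so an automated upload script would likely need to queue records and retry. The sketch below shows that pattern only. The `upload` callable is a hypothetical stand-in supplied by the caller, and this is not a real geome-db client or API.

```python
from collections import deque


class UploadQueue:
    """Hold sample records on the device until an upload attempt succeeds.

    `upload` is a hypothetical callable supplied by the caller (for example a
    wrapper around whatever submission route geome-db offers); it should return
    True on success and False, or raise OSError, on failure.
    """

    def __init__(self, upload):
        self.upload = upload
        self.pending = deque()

    def add(self, record):
        self.pending.append(record)

    def flush(self):
        """Send pending records in order, stopping at the first failure."""
        sent = 0
        while self.pending:
            try:
                ok = self.upload(self.pending[0])
            except OSError:
                ok = False
            if not ok:
                break  # keep the record; retry on the next flush
            self.pending.popleft()
            sent += 1
        return sent


# Simulated connectivity: the stand-in uploader succeeds only when "online".
state = {"online": False}


def fake_upload(record):
    return state["online"]


queue = UploadQueue(fake_upload)
queue.add({"genus": "Rana", "disease": "Bd", "detected": False})
queue.add({"genus": "Bufo", "disease": "Bd", "detected": True})
first = queue.flush()   # 0: offline, both records stay queued
state["online"] = True
second = queue.flush()  # 2: both records sent
```

Stopping at the first failure preserves upload order and avoids hammering an unreachable server.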

Another viable option is an in-house database that logs the diagnostic tests, whose records we would later contribute to external sources. The device would still have the requirements above, but it would not need internet access. Possible database designs can be taken from Deck, 2017 (sample-based design) and Daughton, 2017 (biosurveillance-oriented design).
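An in-house log could be as small as a single SQLite table, with an export step that produces a file for later contribution. The schema below is a hypothetical minimal design. It is not taken from Deck, 2017 or Daughton, 2017, and its column names would need mapping onto geome-db's template.

```python
import csv
import io
import sqlite3

# Hypothetical schema; column names are illustrative and would need to be
# mapped onto geome-db's own template before contributing data.
SCHEMA = """
CREATE TABLE IF NOT EXISTS samples (
    id           INTEGER PRIMARY KEY,
    latitude     REAL NOT NULL CHECK (latitude BETWEEN -90 AND 90),
    longitude    REAL NOT NULL CHECK (longitude BETWEEN -180 AND 180),
    collected    TEXT NOT NULL,  -- ISO 8601 date
    genus        TEXT NOT NULL,
    record_basis TEXT NOT NULL,
    sample_type  TEXT NOT NULL,
    disease      TEXT NOT NULL,
    detected     INTEGER NOT NULL CHECK (detected IN (0, 1)),
    exported     INTEGER NOT NULL DEFAULT 0  -- 1 once contributed externally
);
"""
COLUMNS = ["latitude", "longitude", "collected", "genus",
           "record_basis", "sample_type", "disease", "detected"]


def log_test(conn, **fields):
    """Insert one diagnostic result locally; no network access needed."""
    fields["detected"] = int(fields["detected"])  # store the flag as 0/1
    placeholders = ", ".join(f":{c}" for c in COLUMNS)
    conn.execute(
        f"INSERT INTO samples ({', '.join(COLUMNS)}) VALUES ({placeholders})",
        fields,
    )
    conn.commit()


def export_unsent(conn):
    """Return not-yet-exported rows as CSV text and mark them as exported."""
    rows = conn.execute(
        f"SELECT id, {', '.join(COLUMNS)} FROM samples WHERE exported = 0"
    ).fetchall()
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    for row in rows:
        writer.writerow(row[1:])  # drop the internal id
    conn.executemany("UPDATE samples SET exported = 1 WHERE id = ?",
                     [(row[0],) for row in rows])
    conn.commit()
    return buf.getvalue()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
log_test(conn, latitude=24.45, longitude=54.38, collected="2021-06-01",
         genus="Sclerophrys", record_basis="live organism",
         sample_type="swab", disease="Bd", detected=False)
csv_text = export_unsent(conn)
print(csv_text.splitlines()[1])
```

The `exported` flag lets the device contribute only new records each time it syncs.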

Visualization & Exploratory Tools

The UC Davis BioPortal is an example of a visualization tool built specifically to analyze trends in an infectious disease and predict epidemics.

The Amphibian Disease Portal and geome-db share the same front end for data visualization and querying. It is currently very basic: a simple overlay of sample locations on a world map. We could build a better, more interactive visualization tool using R and Shiny. Based solely on the combined data available from GEOME, AmphibiaWeb, and AmphiBIO, the following visualizations could be added:

  • A time-based graph showing the spread of the disease
  • A phylogenetic tree of the affected species, or of the pathogen if additional genetic information is available in the geome-db sample
  • Search by additional parameters such as weather conditions and ecological traits
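The data behind the first item, a time-based graph of spread, could be prepared as a cumulative count of sites with a positive test. This sketch is in Python for illustration, and the plotting itself would live in the R/Shiny tool. It assumes each record can be reduced to a `(year, site_id, positive)` tuple, where `site_id` is a hypothetical key such as coordinates rounded to a grid cell.

```python
def cumulative_positive_sites(records):
    """Cumulative count of distinct sites with at least one positive test, per year.

    `records` holds (year, site_id, positive) tuples. `site_id` is a hypothetical
    key, e.g. coordinates rounded to a grid cell.
    """
    first_positive = {}  # site -> earliest year with a positive result
    for year, site, positive in records:
        if positive and (site not in first_positive or year < first_positive[site]):
            first_positive[site] = year
    if not first_positive:
        return []
    new_sites = {}
    for year in first_positive.values():
        new_sites[year] = new_sites.get(year, 0) + 1
    series, running = [], 0
    # Include years with no new sites so the plotted line has no gaps.
    for year in range(min(new_sites), max(new_sites) + 1):
        running += new_sites.get(year, 0)
        series.append((year, running))
    return series


records = [
    (2010, "A", True), (2011, "A", True), (2011, "B", False),
    (2012, "B", True), (2012, "C", True),
]
print(cumulative_positive_sites(records))  # [(2010, 1), (2011, 1), (2012, 3)]
```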

In addition, epidemic modeling, risk assessment, and simulation tools could be added. Heslop, 2018 lists public, open-source tools with a similar purpose that were built with human epidemics in mind. By adapting some of their variables, we could incorporate them into our visualization tool. Similarly, we could integrate a hypothesis-testing model from Adams, 2010 to help estimate how likely negative samples are to be false negatives.
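As a simple illustration of the false-negative question, the sketch below uses the standard sample-size calculation for detecting a disease. It is not the model from Adams, 2010. It assumes independent random sampling from a large population, a per-sample test sensitivity `sensitivity`, perfect specificity, and a true prevalence `prevalence`. Under these assumptions, all n samples test negative with probability $(1 - p \cdot Se)^n$.

```python
import math


def prob_all_negative(n, prevalence, sensitivity):
    """P(all n sampled animals test negative | disease present at `prevalence`).

    Assumes independent random sampling from a large population and a test
    with the given per-sample sensitivity and perfect specificity.
    """
    return (1 - prevalence * sensitivity) ** n


def samples_needed(prevalence, sensitivity, confidence=0.95):
    """Smallest n for which at least one positive is observed with `confidence`."""
    p_detect = prevalence * sensitivity
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_detect))


# Example: 10% prevalence, 90% test sensitivity.
print(round(prob_all_negative(30, 0.1, 0.9), 3))  # 0.059
print(samples_needed(0.1, 0.9))                   # 32
```

In this example, 30 negative samples would still leave roughly a 6% chance that an infected population was missed.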

Analysis, modeling and prediction

The limited disease data in the Amphibian Disease Portal is a significant problem for any analysis, modeling, or prediction. Olson, 2013 performed a correlative analysis of the spread of bd against environmental factors such as temperature, precipitation, and habitat. It used data from bd-maps.net, a predecessor of the Amphibian Disease Portal. Replicating this analysis with more sophisticated models could yield different results. The heavy bias toward North American data will make it challenging to generalize the results to a wider geographical area.

A few studies have modeled bd epidemics, but mostly for isolated species. Drawert, 2017 uses both a deterministic and a stochastic model to simulate a bd epidemic in an isolated species in the Sierra Nevada mountains, with the ultimate aim of choosing the best conservation strategy. The controlled, isolated setting is a major limitation on the general adoption of this model.
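The difference between deterministic and stochastic simulation can be shown on a deliberately simplified SIS (susceptible-infected-susceptible) model. This is not the model used by Drawert, 2017. Infected animals here recover at rate `gamma`, transmission happens at rate `beta`, and there are no births or bd-induced deaths. The deterministic version uses Euler integration. The stochastic version uses Gillespie's direct method, which can also produce chance extinction of the infection, something the deterministic model cannot.

```python
import random


def sis_deterministic(s, i, beta, gamma, t_end, dt=0.01):
    """Euler integration of dS/dt = -beta*S*I/N + gamma*I and dI/dt = -dS/dt."""
    n = s + i
    for _ in range(int(round(t_end / dt))):
        infections = beta * s * i / n
        recoveries = gamma * i
        s += (recoveries - infections) * dt
        i += (infections - recoveries) * dt
    return s, i


def sis_gillespie(s, i, beta, gamma, t_end, seed=0):
    """One stochastic realization of the same SIS model (Gillespie direct method)."""
    rng = random.Random(seed)
    n = s + i
    t = 0.0
    while i > 0:  # the infection can die out by chance
        rate_inf = beta * s * i / n
        rate_rec = gamma * i
        total = rate_inf + rate_rec
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        if rng.random() * total < rate_inf:
            s, i = s - 1, i + 1  # infection event
        else:
            s, i = s + 1, i - 1  # recovery event
    return s, i


# Hypothetical parameters; the deterministic equilibrium is I* = N * (1 - gamma/beta) = 80.
det = sis_deterministic(95, 5, beta=0.5, gamma=0.1, t_end=200)
sto = sis_gillespie(95, 5, beta=0.5, gamma=0.1, t_end=200, seed=1)
print([round(x, 1) for x in det], sto)
```

Running the stochastic version with many seeds gives a distribution of outcomes, which is what makes it useful for comparing strategies under uncertainty.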

Follow-up questions

How and why do researchers currently assess genus?

How

  • The definition of a genus does not seem to be strictly codified. Instead, it rests on various ecological, morphological, and genetic factors. Even within amphibians, it is hard to find a codified heuristic that applies universally across all species.
  • Higher-level divisions rest on easily observable differences (frogs/toads vs. salamanders). Below that level, shared physical characteristics, DNA sequences, or other molecular evidence are used to group organisms into the same phylogenetic group.
  • As a byproduct, genus classification will correlate with geography, habitat, and temperature, but a generalized codified system is not possible.

Sources: AmphibiaWeb, List of Amphibian Names

Why

  • Most modeling and prediction with amphibian disease data seems to be restricted to a single genus or species. So, I think genus is important high-level metadata for filtering and grouping the data, and grouping by genus may lead to different conclusions in the analysis.
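Grouping by genus before analysis could start with prevalence per genus, dropping genera with too few samples to say anything. This is a minimal sketch. It assumes records can be reduced to `(genus, positive)` pairs, and `min_samples` is a hypothetical cutoff.

```python
from collections import defaultdict


def prevalence_by_genus(records, min_samples=1):
    """Map genus -> (positives, total, prevalence) from (genus, positive) pairs."""
    counts = defaultdict(lambda: [0, 0])  # genus -> [positives, total]
    for genus, positive in records:
        counts[genus][0] += int(positive)
        counts[genus][1] += 1
    return {
        genus: (pos, total, pos / total)
        for genus, (pos, total) in counts.items()
        if total >= min_samples  # skip genera with too few samples
    }


records = [("Rana", True), ("Rana", False), ("Rana", True),
           ("Bufo", False), ("Atelopus", True)]
print(prevalence_by_genus(records, min_samples=2))
```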

Is there a need for visualization and data exploration?

  • Ask researchers who work on amphibians and have used amphibiandisease.org before.

Contact details

People to email spreadsheet

Email draft

Should we attempt analysis, modeling and prediction?

  • Ask Dr. Penner about issues with generalizing across species
  • Could be tailored toward conservation agencies choosing the best conservation strategy (genus would be important in this case; Drawert, 2017)
  • Simulations, which do not necessarily have strong predictive power, could be an experimental addition to the visualization tool. For instance, Dur-e-Ahmad, 2014 accounts for exposure to bd from the larval stage onward. Based on initial parameters, it models both death from bd and reproduction. This may be useful for setting a tolerable bd level in a population.
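The idea of a "tolerable bd level" can be illustrated with a far simpler model than Dur-e-Ahmad, 2014. The sketch below uses a linear two-stage (larva, adult) yearly projection matrix. A yearly infection probability, applied to both larvae and adults, reduces survival through bd mortality. The population persists when the dominant eigenvalue is at least 1. Bisection then finds the largest infection probability that still allows persistence. All parameter values are hypothetical placeholders.

```python
import math


def growth_rate(infection, fecundity=2.0, larval_survival=0.3,
                adult_survival=0.6, bd_mortality=0.8):
    """Dominant eigenvalue of the yearly projection matrix

        [[0,                   fecundity         ],
         [larval_survival * k, adult_survival * k]]

    where k = 1 - infection * bd_mortality is the fraction escaping bd death.
    All parameter defaults are hypothetical placeholders.
    """
    k = 1 - infection * bd_mortality
    a = larval_survival * k  # larvae that survive and mature
    b = adult_survival * k   # adults that survive
    # Positive root of lambda^2 - b*lambda - fecundity*a = 0.
    return (b + math.sqrt(b * b + 4 * fecundity * a)) / 2


def max_tolerable_infection(tol=1e-6, **params):
    """Largest yearly infection probability keeping growth rate >= 1 (bisection)."""
    if growth_rate(0.0, **params) < 1:
        return None  # population declines even without bd
    if growth_rate(1.0, **params) >= 1:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if growth_rate(mid, **params) >= 1:
            lo = mid
        else:
            hi = mid
    return lo


q = max_tolerable_infection()
print(round(q, 3))  # 0.208
```

Bisection works here because the growth rate decreases monotonically as infection increases.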

Questions for Dr. Penner

  • How much does the lack of genus limit the usability of our data?
  • Visualization questions
  • Are generalized models for all amphibians useful? How many additional variables would we need to include before the model is useful beyond experimentation?
  • Questions about stochastic models and their interpretability: how useful would these models actually be in picking a valid conservation strategy?